INTRODUCTION


This study has been undertaken to develop an inexpensive
and short test for high school students which measures
ability to think about fairly familiar materials.

The principle of the test used was first developed by
J. C. Peterson in his Verbal Thinking Test (for college
students). This test consists of sixty groups of four
familiar words, each group constructed so that it contains one
word which includes at least one meaning of each of the other
three words. The problem is to find that word in each group.

The Verbal Thinking Test is only partially a vocabulary
test. The words used are supposedly familiar, and emphasis
is placed on the fact that one word may have more than one
meaning.

In order to utilize the principle of Dr. Peterson's
test to measure high school students, the number of words in
each group was reduced to three, and allowance was made for
the difference in vocabulary between high school and college
students. The problem is to select the word in each group
which includes a meaning of each of the other two words.

Each  group  constitutes  a  test  item. 

MATERIALS AND PROCEDURE

There were constructed, originally, 192 items, examples
and arrangement of which are shown below.


The plan was to select the 100 best items and combine
them into one test. In order to do this, the original
items were divided by chance into three tests of 64 items
each. These tests (called Forms A, B, and C merely for
identification) were given to the 172 high school students
of Onaga and Sharon Springs, Kansas. Of these 172 students,
54 took Form A, 60 took Form B, and 58 took Form C.
Precautions were taken to insure equal opportunities, including
giving the tests to all students of one high school at the
same time, identical direction sheets for all supervisors,
and allowing the same time, thirty minutes, in each case.
Since these tests were not speed tests, the time allowed was
sufficient for every student to complete the test. It was
found that 20 minutes was sufficient time to complete the
64-item test.

Forms  A,  B,  and  C  were  scored  for  the  number  of  correct 
responses.    The  scores  were  then  distributed  and  percentile 
ranks  calculated.    The  highest  25  per  cent  and  the  lowest  25 
per  cent  of  the  papers  were  used  to  determine  the  value  of 
the items. The contrast between these groups of papers
makes the differentiation between the high paper and the low
paper more clear cut than would the contrast between the
highest 50 per cent and the lowest 50 per cent. The plan
followed in determining the value of the items was as
follows. Each student's response to each item was tabulated,
so as to show the number of correct and incorrect responses
to each item. The criterion used to select the best items
was "goodness", defined as the extent to which the item
measured the difference between the high paper and the low
paper. Two constants were calculated for each item. The
first one was the number of correct responses divided by the
number of incorrect responses, among the highest 25 per cent
of the papers. The second one was the number of correct
responses divided by the number of incorrect responses among
the lowest 25 per cent of the papers. The ratio representing
the "goodness" of an item was the first constant divided
by the second constant, or the ratio showing the extent to
which the item differentiated between the high paper and the
low paper. The items having the largest ratios were selected
as the best. Among the 100 best items these ratios varied
from infinity (items which no student in the lowest 25 per
cent answered correctly) to 16.566.
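
The "goodness" ratio described above can be illustrated with a minimal sketch. The function names and the counts are hypothetical and are not taken from the study; the handling of groups in which no student erred is an assumption, since the text only mentions the infinite case arising from the lowest 25 per cent.

```python
import math


def odds(correct: int, incorrect: int) -> float:
    """Correct responses divided by incorrect responses within one group."""
    if incorrect == 0:
        # Assumption: if nobody in the group missed the item, the ratio is unbounded.
        return math.inf if correct > 0 else math.nan
    return correct / incorrect


def goodness(hi_correct: int, hi_incorrect: int,
             lo_correct: int, lo_incorrect: int) -> float:
    """First constant (highest 25 per cent) divided by second constant (lowest 25 per cent)."""
    first = odds(hi_correct, hi_incorrect)
    second = odds(lo_correct, lo_incorrect)
    if math.isnan(first) or math.isnan(second):
        return math.nan
    if second == 0:
        # No student in the low group answered correctly: infinite goodness.
        return math.inf if first > 0 else math.nan
    if math.isinf(first) and math.isinf(second):
        return math.nan  # inf / inf is undefined
    return first / second


# Hypothetical counts for two items (not data from the study).
example_a = goodness(12, 2, 3, 11)   # (12/2) / (3/11)
example_b = goodness(10, 4, 0, 14)   # low group had no correct answers
```

Under these assumptions the first item has a finite ratio and the second an infinite one, so the second would rank among the best items.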

The 100 best items as selected by "goodness" were then
arranged in order of increasing difficulty, the difficulty
of an item being defined numerically as the ratio of the
number of correct responses among the highest 25 per cent and
the lowest 25 per cent of the papers, to the total number of
responses.
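
A short sketch of this ordering follows. It assumes that a larger ratio of correct responses marks an easier item, so that "increasing difficulty" means a descending ratio; the item names and counts are hypothetical.

```python
def proportion_correct(hi_correct: int, lo_correct: int, total_responses: int) -> float:
    """Correct responses in the high and low groups over all their responses."""
    if total_responses <= 0:
        raise ValueError("total_responses must be positive")
    return (hi_correct + lo_correct) / total_responses


def order_by_increasing_difficulty(items: dict[str, tuple[int, int, int]]) -> list[str]:
    """Easiest item first: a larger proportion correct is read as less difficult."""
    return sorted(items, key=lambda name: proportion_correct(*items[name]), reverse=True)


# Hypothetical items: name -> (hi_correct, lo_correct, total_responses).
items = {"item_1": (10, 2, 28), "item_2": (14, 9, 28), "item_3": (12, 5, 28)}
ordered = order_by_increasing_difficulty(items)
```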

These 100 items were combined into one test which was
given to 1150 students, 371 in West Junior High School, 291
in East Junior High School, and 488 in the Senior High
School, of Parsons, Kansas. Opportunities were again
equalized as far as possible by having all tests in Junior High
Schools given during the same period, and all tests in
Senior High School given during the same period.
Supervisors were given direction sheets and asked to follow them
absolutely; this was evidently done, with two exceptions to
be noted later.

Three scores were recorded for each paper; namely,
(1) the number of correct responses among the odd-numbered
items, (2) the number of correct responses among the
even-numbered items, and (3) the total number of correct responses.

The reliabilities were calculated from the first two
scores by the odd-even method, keeping the Junior High
Schools separate from the Senior High School. The
reliabilities were stepped up to the length of the test by the
Spearman-Brown prophecy formula (3). The Spearman-Brown
formula is given below.

\[ r_{nn} = \frac{n\,r_{11}}{1 + (n-1)\,r_{11}} \]

in which $r_{nn}$ is the stepped-up reliability, $n$ is the number
of times it is stepped up, and $r_{11}$ is the original
reliability coefficient.
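
As an illustration of the odd-even procedure, the sketch below correlates odd-item and even-item scores and then steps the half-test correlation up to full length. It assumes the usual form of the Spearman-Brown formula, n r / (1 + (n - 1) r), with n = 2 for two halves; the scores are hypothetical and the function names are illustrative only.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r: float, n: float = 2.0) -> float:
    """Reliability of a test n times as long as the one that gave r."""
    return n * r / (1 + (n - 1) * r)


# Hypothetical odd-item and even-item scores for five papers (not study data).
odd_scores = [20, 25, 31, 38, 42]
even_scores = [22, 24, 33, 35, 44]
half_test_r = pearson_r(odd_scores, even_scores)
full_test_r = spearman_brown(half_test_r, n=2)
```

For any half-test correlation between 0 and 1 the stepped-up value is larger than the half-test value, which is why the full-length reliability exceeds the raw odd-even correlation.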

A percentile distribution was made of the scores in
each class; percentile ranks were found and converted into
percentile scores.

Mathematics and English grades were secured for the
students, and validity coefficients obtained by correlating
Verbal Thinking scores with these grades, separately and
then combined. Four grades were recorded for each student:
first semester Mathematics, second semester Mathematics,
first semester English, and second semester English.

Jack Dunlap and Edward Cureton (1) have developed a
formula for the correlation coefficient corrected for
attenuation in the criterion, with its standard error. They have
found that between a test $X_2$ and a criterion $X_\infty$ measured by
two fallible scores $X_1$ and $X_3$, the correlation is found by
the formula

\[ r_{2\infty} = \frac{r_{2(1+3)}}{\sqrt{\dfrac{2r_{13}}{1 + r_{13}}}}, \quad \text{in which} \quad r_{2(1+3)} = \frac{r_{12} + r_{23}}{\sqrt{2 + 2r_{13}}} \]

The standard error of this corrected coefficient is
computed from the formula given by the same authors (1).

Using first and second semester grades as the two
measures of the criterion, a validity coefficient is secured
which has been corrected for attenuation in the criterion.
There were three such coefficients found for each class, the
correlation between Verbal Thinking and Mathematics, the
correlation between Verbal Thinking and English, and the
correlation between Verbal Thinking and combined Mathematics
and English.
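
The correction can be sketched as follows, assuming the two criterion measures (first and second semester grades) are equally variable, so that the test's correlation with their sum is (r12 + r23) / sqrt(2 + 2 r13) and the reliability of that sum is 2 r13 / (1 + r13). The variable numbering (test = 2, criterion measures = 1 and 3) and the sample correlations are assumptions for illustration, not values from the study.

```python
from math import isclose, sqrt


def corrected_validity(r12: float, r23: float, r13: float) -> float:
    """Test-criterion correlation corrected for attenuation in the criterion.

    r12, r23: test (variable 2) with each criterion measure (variables 1 and 3).
    r13: correlation between the two criterion measures.
    """
    if r13 <= 0:
        raise ValueError("r13 must be positive for the correction to be defined")
    # Correlation of the test with the sum of the two criterion measures.
    r_test_sum = (r12 + r23) / sqrt(2 + 2 * r13)
    # Reliability of that sum (r13 stepped up to double length).
    reliability_of_sum = 2 * r13 / (1 + r13)
    return r_test_sum / sqrt(reliability_of_sum)


# Hypothetical correlations (not values from the study).
r = corrected_validity(r12=0.40, r23=0.50, r13=0.81)
```

Algebraically the two steps reduce to (r12 + r23) / (2 sqrt(r13)), which gives a quick check on hand computation.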

A comparison between the validity of boys' and girls'
scores was made. These coefficients were secured by the
method described above.

Terman Group Test Intelligence Quotients were
available for most of the students in East Junior High School. A
study was made in each class of the extent to which these
quotients correlated with grades and with Verbal Thinking
scores. Correlations were secured between Verbal Thinking
scores and grades for the students who had Terman scores.
Grades were then correlated with the Terman and Verbal
Thinking scores by means of multiple correlations, as given
by Kelley's (5) formula for finding multiple correlations
when three variables are involved. The formula is as
follows:


\[ R_{1.23} = \sqrt{\frac{r_{12}^{2} + r_{13}^{2} - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^{2}}} \]

in which $X_1$ is the criterion, $X_2$ is Terman scores, and $X_3$
is Verbal Thinking scores.
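
A minimal sketch of the three-variable multiple correlation is given here. It assumes the standard form R² = (r12² + r13² - 2 r12 r13 r23) / (1 - r23²), with variable 1 the criterion and variables 2 and 3 the two tests; the correlations used are hypothetical, and the function name is illustrative.

```python
from math import sqrt


def multiple_r(r12: float, r13: float, r23: float) -> float:
    """Multiple correlation of variable 1 with variables 2 and 3 combined."""
    if abs(r23) >= 1:
        raise ValueError("r23 must lie strictly between -1 and 1")
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    # Mutually consistent correlations always give 0 <= r_squared <= 1.
    return sqrt(r_squared)


# Hypothetical correlations (not values from the study).
R = multiple_r(r12=0.60, r13=0.50, r23=0.50)
```

The multiple correlation is never smaller than the larger of the two separate validities, so an appreciable gain over both indicates that the two tests supplement each other.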


Grade norms were found for each grade, and age norms
for the years 12 to 17. Each student's age was taken as
that of his nearest birthday.

All of the errors calculated for coefficients in this
study are standard errors.

RESULTS AND DISCUSSION

The reliability of the Verbal Thinking Test was found
to be .961 ± .002 for the Senior High School, and
.938 ± .003 for the Junior High Schools.

The validity coefficients are shown in tables 1 and 2.
These coefficients are higher when the criterion used is the
combined Mathematics and English grades than they are for
either subject separately. The validity is found to vary
from .378 ± .063 for the seventh grade to .520 ± .043 for
the ninth grade. These coefficients could reasonably be
expected to be higher for the composite of all grades.
Freeman (1) found that the correlation between standardized
tests and composite standings of the pupils could be said to
lie usually between .40 and .60.

A comparison of the validity of the scores made by boys
with those made by girls is shown in table 3. With one
exception, the girls' scores show markedly higher correlations.

The comparison of Terman Intelligence Quotients and
Verbal Thinking scores, as shown in table 4, must necessarily
be limited, for the numbers were small. There were 76 in
the seventh, 71 in the eighth, and 54 in the ninth grade.

The coefficients show that the Terman Intelligence
Quotients correlate with grades to a much greater extent
than do Verbal Thinking scores. It must be remembered,
however, that the Terman scores were available before the
grades were given and therefore the relationship may be to
some extent causal. That fact may account for the high
validity shown by the Terman Group Test for those students.
Verbal Thinking scores were not available until after the
grades were given.

The correlations between Terman scores and Verbal
Thinking scores vary from .473 ± .089 to .554 ± .094.
Therefore, the tests do not measure all of the same abilities.
That they supplement each other is shown by the multiple
correlations between grades and the combined Terman and
Verbal Thinking scores, the multiples being higher than the
validity coefficients for either test.

The reliability of the Terman Group Test is given by
Kelley (4) as .89 for the ninth grade, to compare with a
reliability of .938 for the Verbal Thinking Test in the
Junior High School.

The norms, with the number of students they were based
upon, are shown in table 5.


Table 5. Age and Grade Norms

Age    Norm      Grade        Number    Norm
12     43.51     Seventh      219       40.10
13     45.01     Eighth       195       4?.45
14     47.34     Ninth        248       ??.58
15     4?.??     Sophomore    183       53.88
16     49.94     Junior       150       51.24
17     52.10     Senior       155       59.94


These norms show a continual increase for all ages, and
for all grades except the Junior. In this case, exceptions
from the prescribed procedure in the administration of the
test were found which might explain the lower average. The
exceptions were failures to explain the examples given on
the direction sheet.


The Verbal Thinking Test for High School Students was
designed originally for the use of Senior High School
students. A comparison shows that the reliability and validity
of the test are higher for Senior High School than for
Junior High School.

The high reliability and the progressively higher norms
indicate that the Verbal Thinking Test possesses the
essentials of a good test. The validity coefficients are no
lower than those usually found. The grades used as a
criterion were not normally distributed, as evidenced by the
fact that in some cases there were as many as 29 A grades
where there were only two failures. Such faulty
distributions make grades a very unsatisfactory criterion.

The Verbal Thinking Test is not intended to be a
general intelligence test. It is intended to be a test of
ability to think. It is very possible that this ability is
not included in the awarding of grades.

This  study  has  shown  that  it  will  be  worth  while  to 
validate  the  Verbal  Thinking  Test  on  the  basis  of  such  a 
criterion  as  pooled  judgments  of  that  ability  which  it  is 
designed  to  measure.    This  will  necessitate  selection  of  com- 
petent judges  and  the  formulation  of  a  basis  for  measuring 
students  as  to  their  ability  to  think. 

The Verbal Thinking Test will again be revised by means
of the data secured from the Parsons students. The revised
test will consist of approximately 60 items, and the time
allowed will be 20 minutes. The test will then be
standardized on high school students.